Overview

Dataset statistics

Number of variables: 10
Number of observations: 2243791
Missing cells: 5339117
Missing cells (%): 23.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 171.2 MiB
Average record size in memory: 80.0 B
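
The Missing cells (%) figure is missing cells over total cells: 5339117 / (10 × 2243791) ≈ 23.8%. The count 5339117 is also exactly the sum of the per-variable missing counts in the warnings below (983859 + 983859 + 617126 + 2137147 + 617126). A minimal sketch of these overview counts, assuming rows are held as plain Python dicts with `None` marking a missing cell; the `overview` function and `Record` type are hypothetical illustrations, not pandas-profiling's API:

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Optional[Any]]  # hypothetical: column name -> value, None marks a missing cell


def overview(rows: List[Record]) -> Dict[str, float]:
    """Dataset-level counts in the spirit of the Overview table (all rows share the same keys)."""
    columns = list(rows[0]) if rows else []
    n_vars, n_obs = len(columns), len(rows)
    missing = sum(1 for row in rows for col in columns if row[col] is None)
    total_cells = n_vars * n_obs
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row[col] for col in columns)  # values must be hashable
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }
```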

Variable types

NUM: 5
CAT: 3
UNSUPPORTED: 2

Warnings

Nature culture speciale has a high cardinality: 118 distinct values [High cardinality]
Surface reelle bati has 983859 (43.8%) missing values [Missing]
Nombre pieces principales has 983859 (43.8%) missing values [Missing]
Nature culture has 617126 (27.5%) missing values [Missing]
Nature culture speciale has 2137147 (95.2%) missing values [Missing]
Surface terrain has 617126 (27.5%) missing values [Missing]
Valeur fonciere is highly skewed (γ1 = 119.7062575) [Skewed]
Surface reelle bati is highly skewed (γ1 = 184.2943118) [Skewed]
Surface terrain is highly skewed (γ1 = 30.04295042) [Skewed]
df_index has unique values [Unique]
Code postal has an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Code type local has an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Surface reelle bati has 282155 (12.6%) zeros [Zeros]
Nombre pieces principales has 368414 (16.4%) zeros [Zeros]
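
The warnings are threshold rules applied to per-variable statistics. The sketch below reproduces the pattern; the thresholds (`SKEWNESS_THRESHOLD`, `CARDINALITY_THRESHOLD`, `ZEROS_PCT_THRESHOLD`) and the `warnings_for` helper are hypothetical, chosen only to be consistent with this report (for example, Surface terrain's 56 zeros are not flagged while Surface reelle bati's 12.6% are). They are not claimed to be pandas-profiling's configured defaults.

```python
from typing import Dict, List

# Hypothetical thresholds, picked only to agree with the warnings in this report.
SKEWNESS_THRESHOLD = 20.0
CARDINALITY_THRESHOLD = 50
ZEROS_PCT_THRESHOLD = 10.0


def warnings_for(name: str, stats: Dict[str, float], categorical: bool = False) -> List[str]:
    """Build report-style warning strings for one column from precomputed statistics."""
    found = []
    if categorical and stats.get("distinct", 0) > CARDINALITY_THRESHOLD:
        found.append(f"{name} has a high cardinality: {int(stats['distinct'])} distinct values")
    if stats.get("missing", 0) > 0:
        found.append(f"{name} has {int(stats['missing'])} ({stats['missing_pct']:.1f}%) missing values")
    if abs(stats.get("skewness", 0.0)) > SKEWNESS_THRESHOLD:
        found.append(f"{name} is highly skewed (γ1 = {stats['skewness']})")
    if stats.get("zeros_pct", 0.0) > ZEROS_PCT_THRESHOLD:
        found.append(f"{name} has {int(stats['zeros'])} ({stats['zeros_pct']:.1f}%) zeros")
    return found
```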

Reproduction

Analysis started: 2020-10-07 10:14:01.381535
Analysis finished: 2020-10-07 10:15:37.595767
Duration: 1 minute and 36.21 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 2243791
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1240083.201
Minimum: 0
Maximum: 2535790
Zeros: 1
Zeros (%): < 0.1%
Memory size: 17.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 121050.5
Q1: 608478.5
Median: 1238468
Q3: 1856156.5
95th percentile: 2398652.5
Maximum: 2535790
Range: 2535790
Interquartile range (IQR): 1247678

Descriptive statistics

Standard deviation: 724137.6734
Coefficient of variation (CV): 0.5839428134
Kurtosis: -1.173666804
Mean: 1240083.201
Median Absolute Deviation (MAD): 623788
Skewness: 0.03194925991
Sum: 2.782487526e+12
Variance: 5.2437537e+11
Monotonicity: Strictly increasing
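
These quantities relate as IQR = Q3 - Q1 (1856156.5 - 608478.5 = 1247678), CV = standard deviation / mean, and MAD = median of |x - median|. A minimal plain-Python sketch, assuming linear-interpolation quantiles (NumPy's default rule) and the sample standard deviation (n - 1 denominator, as pandas uses by default). The helper names, and the monotonicity labels other than the two that appear in this report, are hypothetical:

```python
import math
import statistics
from typing import Dict, List, Sequence


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile on pre-sorted data, 0 <= q <= 1."""
    pos = (len(sorted_values) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def describe(values: List[float]) -> Dict[str, float]:
    """Quantile and descriptive statistics for a numeric list with at least 2 values."""
    xs = sorted(values)
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    median = statistics.median(xs)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "p5": quantile(xs, 0.05),
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": quantile(xs, 0.95),
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
        "cv": std / mean,  # coefficient of variation
        "mad": statistics.median(abs(x - median) for x in xs),  # median absolute deviation
    }


def monotonicity(values: Sequence[float]) -> str:
    """Classify a sequence in its stored (row) order."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"
```

Monotonicity depends on row order, not on the values' distribution: df_index is "Strictly increasing" because each row's index exceeds the previous one, as the last sample rows show.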
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
1511499 | 1 | < 0.1%
1556561 | 1 | < 0.1%
1558608 | 1 | < 0.1%
1519695 | 1 | < 0.1%
1521742 | 1 | < 0.1%
1515597 | 1 | < 0.1%
1517644 | 1 | < 0.1%
1513546 | 1 | < 0.1%
1329184 | 1 | < 0.1%
Other values (2243781) | 2243781 | > 99.9%

Minimum 5 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
2535790 | 1 | < 0.1%
2535789 | 1 | < 0.1%
2535788 | 1 | < 0.1%
2535786 | 1 | < 0.1%
2535785 | 1 | < 0.1%
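
"Fixed size bins" means the range [minimum, maximum] is cut into equal-width intervals; for df_index with 50 bins, each bin spans 2535790 / 50 = 50715.8. A minimal sketch, assuming half-open bins with the maximum placed in the last bin (the convention NumPy's histogram uses); `fixed_width_histogram` is a hypothetical helper:

```python
from typing import List, Sequence, Tuple


def fixed_width_histogram(values: Sequence[float], bins: int = 50) -> Tuple[List[float], List[int]]:
    """Equal-width histogram over [min, max]; returns (bin edges, counts per bin)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0  # degenerate case: all values equal
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Bins are half-open [left, right); the maximum falls into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return edges, counts
```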
 

Nature mutation
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 17.1 MiB

Vente: 2183142
Echange: 26061
Vente en l'état futur d'achèvement: 22083
Vente terrain à bâtir: 7495
Adjudication: 4583

Common values
Value | Count | Frequency (%)
Vente | 2183142 | 97.3%
Echange | 26061 | 1.2%
Vente en l'état futur d'achèvement | 22083 | 1.0%
Vente terrain à bâtir | 7495 | 0.3%
Adjudication | 4583 | 0.2%
Expropriation | 427 | < 0.1%
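
Each Frequency (%) is the count divided by the number of observations, e.g. 2183142 / 2243791 ≈ 97.3% for Vente; the six counts sum to exactly 2243791. A minimal standard-library sketch of such a table (the `frequency_table` helper is hypothetical):

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def frequency_table(values: Iterable[Hashable]) -> List[Tuple[Hashable, int, float]]:
    """(value, count, frequency %) rows, most frequent first; ties keep first-seen order."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, 100.0 * count / total) for value, count in counts.most_common()]
```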
 
[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 34
Median length: 5
Mean length: 5.377907746
Min length: 5
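
The length statistics can be recomputed from the value counts alone, without expanding 2243791 strings. A sketch assuming "length" means the number of Unicode characters; with the Nature mutation counts above it gives max 34, median 5, min 5 and a mean of about 5.3779, matching the report (`length_stats` is a hypothetical helper):

```python
from typing import Dict, List, Tuple


def length_stats(value_counts: Dict[str, int]) -> Dict[str, float]:
    """Length statistics over all observations, computed from a value -> count table."""
    pairs: List[Tuple[int, int]] = sorted((len(v), c) for v, c in value_counts.items())
    n = sum(c for _, c in pairs)

    def kth_length(k: int) -> int:
        # k-th smallest length (0-based) without expanding the counts into a list.
        seen = 0
        for length, count in pairs:
            seen += count
            if k < seen:
                return length
        raise IndexError(k)

    if n % 2:
        median = float(kth_length(n // 2))
    else:
        median = (kth_length(n // 2 - 1) + kth_length(n // 2)) / 2
    return {
        "max": float(pairs[-1][0]),
        "median": median,
        "mean": sum(length * c for length, c in pairs) / n,
        "min": float(pairs[0][0]),
    }


nature_mutation = {
    "Vente": 2183142,
    "Echange": 26061,
    "Vente en l'état futur d'achèvement": 22083,
    "Vente terrain à bâtir": 7495,
    "Adjudication": 4583,
    "Expropriation": 427,
}
stats = length_stats(nature_mutation)
```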

Valeur fonciere
Real number (ℝ≥0)

SKEWED

Distinct: 99365
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 508679.8997
Minimum: 0.01
Maximum: 2086000000
Zeros: 0
Zeros (%): 0.0%
Memory size: 17.1 MiB

Quantile statistics

Minimum: 0.01
5th percentile: 2300
Q1: 51960
Median: 135000
Q3: 247000
95th percentile: 800000
Maximum: 2086000000
Range: 2086000000
Interquartile range (IQR): 195040

Descriptive statistics

Standard deviation: 5645992.665
Coefficient of variation (CV): 11.09930365
Kurtosis: 32045.88504
Mean: 508679.8997
Median Absolute Deviation (MAD): 93000
Skewness: 119.7062575
Sum: 1.141371381e+12
Variance: 3.187723317e+13
Monotonicity: Not monotonic
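
A skewness of 119.7 and a kurtosis of 32045.9 for Valeur fonciere indicate a very long right tail. The kurtosis figures appear to be excess kurtosis (0 for a normal distribution): df_index, which is close to uniform, shows -1.17, near the uniform distribution's -1.2. Below is a sketch of the bias-adjusted estimators that pandas' `Series.skew` and `Series.kurt` compute; that this report used exactly these is an assumption, and the helper names are hypothetical:

```python
import math
from typing import Sequence


def _central_moment(xs: Sequence[float], mean: float, order: int) -> float:
    return sum((x - mean) ** order for x in xs) / len(xs)


def sample_skewness(xs: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3 and non-constant data)."""
    n = len(xs)
    mean = sum(xs) / n
    g1 = _central_moment(xs, mean, 3) / _central_moment(xs, mean, 2) ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def sample_excess_kurtosis(xs: Sequence[float]) -> float:
    """Bias-corrected excess kurtosis G2 (needs n >= 4 and non-constant data)."""
    n = len(xs)
    mean = sum(xs) / n
    g2 = _central_moment(xs, mean, 4) / _central_moment(xs, mean, 2) ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
```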
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
100000 | 20961 | 0.9%
150000 | 20337 | 0.9%
120000 | 19012 | 0.8%
80000 | 18028 | 0.8%
130000 | 17095 | 0.8%
110000 | 16921 | 0.8%
50000 | 16668 | 0.7%
140000 | 16336 | 0.7%
1 | 16270 | 0.7%
200000 | 16141 | 0.7%
Other values (99355) | 2066022 | 92.1%

Minimum 5 values
Value | Count | Frequency (%)
0.01 | 2 | < 0.1%
0.15 | 122 | < 0.1%
0.16 | 3 | < 0.1%
0.18 | 7 | < 0.1%
0.19 | 2 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
2086000000 | 2 | < 0.1%
1750000000 | 3 | < 0.1%
690186750 | 24 | < 0.1%
612990460 | 6 | < 0.1%
400000000 | 1 | < 0.1%
 

Code postal
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 17.1 MiB

Code type local
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 17.1 MiB
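
Code postal and Code type local were rejected, so no statistics were computed for them. The sample rows show postal codes such as 1000 and 75004, which suggests (an assumption) that five-digit French codes were read as numbers and lost their leading zero (01000). A minimal cleaning sketch under that assumption; `normalize_postcode` is a hypothetical helper:

```python
import math
from typing import Optional, Union


def normalize_postcode(raw: Union[int, float, str, None]) -> Optional[str]:
    """Return a five-character postal code, or None if the value looks unusable.

    Assumes codes were stored as numbers and lost their leading zero (01000 -> 1000).
    """
    if raw is None:
        return None
    if isinstance(raw, float):
        if math.isnan(raw) or not raw.is_integer():
            return None
        raw = int(raw)
    text = str(raw).strip()
    if not text.isdigit() or len(text) > 5:
        return None  # leave anything unexpected for manual inspection
    return text.zfill(5)
```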

Surface reelle bati
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct: 3835
Distinct (%): 0.3%
Missing: 983859
Missing (%): 43.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 90.83265287
Minimum: 0
Maximum: 312962
Zeros: 282155
Zeros (%): 12.6%
Memory size: 17.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 20
Median: 63
Q3: 96
95th percentile: 174
Maximum: 312962
Range: 312962
Interquartile range (IQR): 76

Descriptive statistics

Standard deviation: 900.2078557
Coefficient of variation (CV): 9.910619444
Kurtosis: 47314.29444
Mean: 90.83265287
Median Absolute Deviation (MAD): 37
Skewness: 184.2943118
Sum: 114442966
Variance: 810374.1835
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 282155 | 12.6%
80 | 19661 | 0.9%
90 | 17823 | 0.8%
60 | 17756 | 0.8%
70 | 17389 | 0.8%
100 | 15331 | 0.7%
50 | 14415 | 0.6%
65 | 13056 | 0.6%
40 | 12673 | 0.6%
75 | 11967 | 0.5%
Other values (3825) | 837706 | 37.3%
(Missing) | 983859 | 43.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 282155 | 12.6%
1 | 260 | < 0.1%
2 | 195 | < 0.1%
3 | 197 | < 0.1%
4 | 132 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
312962 | 2 | < 0.1%
240000 | 2 | < 0.1%
215000 | 2 | < 0.1%
212120 | 2 | < 0.1%
152856 | 6 | < 0.1%
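
Both percentages use all 2243791 observations as the denominator: 282155 / 2243791 ≈ 12.6% zeros and 983859 / 2243791 ≈ 43.8% missing. Among the 1259932 rows that do have a value, zeros make up about 22.4%. A sketch computing both views, assuming `None` marks a missing value (`zeros_and_missing` is a hypothetical helper):

```python
from typing import Dict, Optional, Sequence


def zeros_and_missing(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Zero and missing shares, over all rows and over non-missing rows only."""
    n = len(values)
    missing = sum(1 for v in values if v is None)
    zeros = sum(1 for v in values if v is not None and v == 0)
    present = n - missing
    return {
        "missing_pct": 100.0 * missing / n,
        "zeros_pct_of_all": 100.0 * zeros / n,  # the denominator this report uses
        "zeros_pct_of_present": 100.0 * zeros / present if present else 0.0,
    }
```

Which view matters depends on the analysis: the share over present values describes the variable itself, while the share over all rows describes the file.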
 

Nombre pieces principales
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 40
Distinct (%): < 0.1%
Missing: 983859
Missing (%): 43.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.500972275
Minimum: 0
Maximum: 67
Zeros: 368414
Zeros (%): 16.4%
Memory size: 17.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 3
Q3: 4
95th percentile: 6
Maximum: 67
Range: 67
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.099390297
Coefficient of variation (CV): 0.8394296565
Kurtosis: 2.720708602
Mean: 2.500972275
Median Absolute Deviation (MAD): 2
Skewness: 0.5417265369
Sum: 3151055
Variance: 4.407439621
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=40)]

Common values
Value | Count | Frequency (%)
0 | 368414 | 16.4%
4 | 220121 | 9.8%
3 | 210343 | 9.4%
2 | 152819 | 6.8%
5 | 135306 | 6.0%
1 | 87239 | 3.9%
6 | 53617 | 2.4%
7 | 19669 | 0.9%
8 | 7247 | 0.3%
9 | 2706 | 0.1%
Other values (30) | 2451 | 0.1%
(Missing) | 983859 | 43.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 368414 | 16.4%
1 | 87239 | 3.9%
2 | 152819 | 6.8%
3 | 210343 | 9.4%
4 | 220121 | 9.8%

Maximum 5 values
Value | Count | Frequency (%)
67 | 1 | < 0.1%
56 | 1 | < 0.1%
54 | 1 | < 0.1%
53 | 2 | < 0.1%
50 | 2 | < 0.1%
 

Nature culture
Categorical

MISSING

Distinct: 27
Distinct (%): < 0.1%
Missing: 617126
Missing (%): 27.5%
Memory size: 17.1 MiB

S: 753946
T: 247849
P: 125548
J: 91931
AB: 85966
Other values (22): 321425

Common values
Value | Count | Frequency (%)
S | 753946 | 33.6%
T | 247849 | 11.0%
P | 125548 | 5.6%
J | 91931 | 4.1%
AB | 85966 | 3.8%
BT | 73779 | 3.3%
L | 62130 | 2.8%
AG | 58788 | 2.6%
VI | 29115 | 1.3%
BR | 24579 | 1.1%
Other values (17) | 73034 | 3.3%
(Missing) | 617126 | 27.5%
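
The Common values table lists the ten most frequent values, folds the remaining 17 distinct values into Other values, and adds a (Missing) row; frequencies are over all observations (753946 / 2243791 ≈ 33.6% for S). A sketch of that layout, assuming `None` marks a missing value (`common_values` is a hypothetical helper):

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def common_values(values: Sequence[Optional[Hashable]], top: int = 10) -> List[Tuple[str, int, float]]:
    """Top-N (value, count, %) rows plus 'Other values (k)' and '(Missing)' rows."""
    n = len(values)
    counts = Counter(v for v in values if v is not None)
    missing = n - sum(counts.values())
    ranked = counts.most_common()
    rows = [(str(v), c, 100.0 * c / n) for v, c in ranked[:top]]
    tail = ranked[top:]
    if tail:
        other = sum(c for _, c in tail)
        rows.append((f"Other values ({len(tail)})", other, 100.0 * other / n))
    if missing:
        rows.append(("(Missing)", missing, 100.0 * missing / n))
    return rows
```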
 
[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 3
Median length: 1
Mean length: 1.697312718
Min length: 1

Nature culture speciale
Categorical

HIGH CARDINALITY
MISSING

Distinct: 118
Distinct (%): 0.1%
Missing: 2137147
Missing (%): 95.2%
Memory size: 17.1 MiB

POTAG: 26810
PIN: 9875
PATUR: 9677
PARC: 9571
FRICH: 6208
Other values (113): 44503

Common values
Value | Count | Frequency (%)
POTAG | 26810 | 1.2%
PIN | 9875 | 0.4%
PATUR | 9677 | 0.4%
PARC | 9571 | 0.4%
FRICH | 6208 | 0.3%
VAOC | 5118 | 0.2%
CHAT | 3632 | 0.2%
CHENE | 2630 | 0.1%
PACAG | 2471 | 0.1%
MARAI | 2449 | 0.1%
Other values (108) | 28203 | 1.3%
(Missing) | 2137147 | 95.2%
 
[Figure: Frequencies of value counts]

Unique

Unique: 6
Unique (%): < 0.1%
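
The Unique figures are consistent with counting values that occur exactly once: Nature mutation's rarest value still appears 427 times and reports Unique 0, while Nature culture speciale, with 118 distinct codes, reports 6. A sketch under that reading; dividing by the number of non-missing values for Unique (%) is an assumption, and `unique_stats` is a hypothetical helper:

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def unique_stats(values: Sequence[Optional[Hashable]]) -> Tuple[int, float]:
    """Number of values seen exactly once, and that number as a % of non-missing values."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    singletons = sum(1 for c in counts.values() if c == 1)
    pct = 100.0 * singletons / len(present) if present else 0.0
    return singletons, pct
```
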
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 3
Mean length: 3.071866319
Min length: 3

Surface terrain
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 40949
Distinct (%): 2.5%
Missing: 617126
Missing (%): 27.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 2802.197914
Minimum: 0
Maximum: 1662560
Zeros: 56
Zeros (%): < 0.1%
Memory size: 17.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 30
Q1: 226
Median: 593
Q3: 1723
95th percentile: 11710
Maximum: 1662560
Range: 1662560
Interquartile range (IQR): 1497

Descriptive statistics

Standard deviation: 10642.79269
Coefficient of variation (CV): 3.798016063
Kurtosis: 2411.94072
Mean: 2802.197914
Median Absolute Deviation (MAD): 466
Skewness: 30.04295042
Sum: 4558237270
Variance: 113269036.3
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
500 | 33216 | 1.5%
1000 | 14981 | 0.7%
800 | 4838 | 0.2%
600 | 4802 | 0.2%
12 | 4209 | 0.2%
400 | 4085 | 0.2%
700 | 3924 | 0.2%
13 | 3825 | 0.2%
200 | 3807 | 0.2%
100 | 3753 | 0.2%
Other values (40939) | 1545225 | 68.9%
(Missing) | 617126 | 27.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 56 | < 0.1%
1 | 3521 | 0.2%
2 | 2840 | 0.1%
3 | 2548 | 0.1%
4 | 2653 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
1662560 | 1 | < 0.1%
1420388 | 1 | < 0.1%
1411524 | 4 | < 0.1%
1250223 | 1 | < 0.1%
1187767 | 1 | < 0.1%
 

Interactions

[Figures: 25 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
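
A direct transcription of that definition in plain Python: the sample covariance divided by the product of the sample standard deviations (`pearson_r` is a hypothetical helper, not pandas-profiling's code):

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: sample covariance over the product of sample standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)
```

Rescaling one variable, for example replacing x with 3x + 7, leaves the result unchanged, which is the location and scale invariance described above.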

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

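To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations. This is the τ_a form; when ties are present, tie-corrected variants such as τ_b are commonly used.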
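
A direct O(n²) sketch of the τ_a formula, counting concordant and discordant pairs; it illustrates the definition on small inputs rather than any library's implementation (`kendall_tau_a` is a hypothetical helper). Pairs tied in X or Y count as neither concordant nor discordant:

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / (n(n - 1)/2), checking every pair of observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For x = [1, 2, 3] and y = [1, 3, 2], two of the three pairs are concordant and one is discordant, giving τ = 1/3.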

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
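
A sketch of the bias correction as given by Bergsma (2013): φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1) and floored at 0, the table dimensions are corrected to r̃ = r - (r - 1)²/(n - 1) and k̃ = k - (k - 1)²/(n - 1), and the corrected V is sqrt(φ̃² / min(k̃ - 1, r̃ - 1)). This is a plain-Python reading of the published formula, not pandas-profiling's code; `cramers_v_corrected` is a hypothetical helper:

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v_corrected(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Bias-corrected Cramér's V for two nominal variables of equal length."""
    n = len(x)
    cell_counts = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)
    r, k = len(row_totals), len(col_totals)
    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (cell_counts.get((a, b), 0) - expected) ** 2 / expected
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    # A variable with a single category leaves nothing to associate; return 0.
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0
```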

Missing values

[Figures: 4 missing-value plots]

Sample

First rows

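  | df_index | Nature mutation | Valeur fonciere | Code postal | Code type local | Surface reelle bati | Nombre pieces principales | Nature culture | Nature culture speciale | Surface terrain
0 | 0 | Vente | 37220.0 | 1000 | 2 | 20.0 | 1.0 | NaN | NaN | NaN
1 | 1 | Vente | 185100.0 | 1000 | 2 | 62.0 | 3.0 | NaN | NaN | NaN
2 | 2 | Vente | 185100.0 | 1000 | 3 | 0.0 | 0.0 | NaN | NaN | NaN
3 | 3 | Vente | 209000.0 | 1160 | 1 | 90.0 | 4.0 | S | NaN | 940.0
4 | 4 | Vente | 134900.0 | 1370 | 1 | 101.0 | 5.0 | S | NaN | 490.0
5 | 5 | Vente | 192000.0 | 1340 | 1 | 88.0 | 4.0 | S | NaN | 708.0
6 | 6 | Vente | 45000.0 | 1250 | 1 | 39.0 | 2.0 | S | NaN | 631.0
7 | 7 | Vente | 45000.0 | 1250 | [5.0] | NaN | NaN | L | NaN | 120.0
8 | 8 | Vente | 65000.0 | 1000 | 3 | 0.0 | 0.0 | NaN | NaN | NaN
9 | 9 | Vente | 65000.0 | 1000 | 2 | 69.0 | 3.0 | NaN | NaN | NaN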

Last rows

  | df_index | Nature mutation | Valeur fonciere | Code postal | Code type local | Surface reelle bati | Nombre pieces principales | Nature culture | Nature culture speciale | Surface terrain
2243781 | 2535779 | Vente | 17521000.0 | 75004 | 2 | 100.0 | 4.0 | S | NaN | 470.0
2243782 | 2535780 | Vente | 17521000.0 | 75004 | 2 | 61.0 | 4.0 | S | NaN | 470.0
2243783 | 2535782 | Vente | 17521000.0 | 75004 | 2 | 70.0 | 3.0 | S | NaN | 470.0
2243784 | 2535783 | Vente | 17521000.0 | 75004 | 2 | 47.0 | 1.0 | S | NaN | 470.0
2243785 | 2535784 | Vente | 17521000.0 | 75004 | 2 | 55.0 | 2.0 | S | NaN | 470.0
2243786 | 2535785 | Vente | 17521000.0 | 75004 | 2 | 66.0 | 4.0 | S | NaN | 470.0
2243787 | 2535786 | Vente | 17521000.0 | 75004 | 2 | 120.0 | 5.0 | S | NaN | 470.0
2243788 | 2535788 | Adjudication | 610000.0 | 75004 | 2 | 44.0 | 2.0 | NaN | NaN | NaN
2243789 | 2535789 | Vente | 1400000.0 | 75002 | 4 | 100.0 | 0.0 | NaN | NaN | NaN
2243790 | 2535790 | Vente | 1400000.0 | 75002 | 2 | 97.0 | 3.0 | NaN | NaN | NaN